READ ME FILE

Project Title: Attainment of Global Diabetes Targets: A Pooled Analysis of Individual-Level Data from Nationally Representative Surveys in 100 Low-, Middle-, and High-Income Countries

Author: Grace Chung (sygrace@umich.edu)

This repository contains:
1. Stata do-file `HPACC cleaning.do`, which prepares harmonized population-based survey data for the analysis of diabetes care targets in low-, middle-, and high-income countries.
2. Step 1 and Step 2 R scripts

# 1. Stata do-file `HPACC cleaning.do`
## Overview

The do-file performs the following tasks:

### 1. Load, Clean, and Append Datasets

The following datasets are loaded, cleaned, and appended:
- `HPACC LMIC Pt 1`
- `HPACC LMIC Pt 2` (merged with `H_LASI_DAD_b1.dta` using `prim_key`)
- `HPACC HIC`
- `HPACC DUA` (merged with `DEGS1_0285_v4 - with p_id.dta` using `germany p_id`)
- `Aruba STEPS`
- `Armenia STEPS`

### 2. Appended Dataset Cleaning

- Drop ineligible countries
- Clean missingness in key variables

### 3. Generate Analysis Variables

- BMI categorical variable
- Region indicator
- Clinical diabetes variable

### 4. Apply Sample Restrictions

The dataset is restricted to observations with non-missing:
- Sampling weights
- Primary sampling unit (PSU)
- Stratum
- Clinical diabetes status

### 5. Define Analytic Samples

We define separate analytic samples for each outcome:

| Outcome               |  Sample Criteria                                                                     |
|-----------------------|--------------------------------------------------------------------------------------|
| Diagnosed             | Ages 30–69, non-pregnant, with clinical diabetes and non-missing sex, BMI, education |
| Glycemic & BP Control | Ages 30–69, non-pregnant, diagnosed diabetes, non-missing sex, BMI, education        |
| Statin Use            | Ages 40–69, non-pregnant, diagnosed diabetes, non-missing sex, BMI, education        |
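
The sample criteria in the table can be sketched as indicator flags. The example below is a minimal illustration in R (the do-file itself is written in Stata); the toy data and all column names (`age`, `pregnant`, `clin_dia`, `diagnosed`, `bmi_cat`, `educ`, `s_*`) are hypothetical and not taken from the do-file.

```r
# Toy data with hypothetical column names (not taken from the do-file).
df <- data.frame(
  age       = c(35, 45, 72, 50, 38),
  pregnant  = c(0, 0, 0, 1, 0),
  clin_dia  = c(1, 1, 1, 1, 0),   # clinical diabetes
  diagnosed = c(1, 1, 1, 1, 0),   # diagnosed diabetes
  sex       = c(1, 2, 1, 2, 1),
  bmi_cat   = c(2, 3, NA, 1, 2),
  educ      = c(1, 2, 3, 1, 2)
)

# Criteria shared by all outcomes: non-pregnant, non-missing sex, BMI, education.
base_ok <- df$pregnant == 0 & complete.cases(df[, c("sex", "bmi_cat", "educ")])

# Helper: TRUE when age is observed and inside [lo, hi].
in_age <- function(age, lo, hi) !is.na(age) & age >= lo & age <= hi

# One 0/1 flag per outcome-specific analytic sample.
df$s_diagnosed <- as.integer(base_ok & in_age(df$age, 30, 69) & df$clin_dia == 1)
df$s_control   <- as.integer(base_ok & in_age(df$age, 30, 69) & df$diagnosed == 1)
df$s_statin    <- as.integer(base_ok & in_age(df$age, 40, 69) & df$diagnosed == 1)
```

In this toy data the first two rows enter the diagnosed and control samples, but only the second row (age 45) enters the statin sample, because the statin criteria start at age 40.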

### 6. Create Outcome Variables

Outcome indicators are generated for:
- Diagnosed diabetes
- Glycemic control
- Blood pressure (BP) control
- Statin use

These are created for both main and sensitivity analyses.

### 7. Merge Population Estimates

- Load and clean World Population Prospects (WPP) 2021 estimates
- Merge population estimates into the combined dataset

### 8. Weighting and Standardization

- Generate population weights for the 30–69 and 40–69 age groups
- Apply WHO age standardization
- Rescale outcome-specific weights:
  - Diagnosed, glycemic control, BP control → 30–69 population
  - Statin use → 40–69 population
- Assign zero weights to individuals outside the analytic sample, so that they contribute to variance estimation but not to the weighted estimates
- Generate crude population weights from `w3`, rescaled to the 30–69 population for sensitivity analyses
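
The zero-weight and rescaling steps can be illustrated with a minimal sketch in R (the do-file itself is written in Stata). The data, the population totals, and the column names (`w`, `in_sample`, `w_use`, `w_pop`) are hypothetical. The order of operations (zeroing before rescaling) is an assumption, and WHO age standardization is omitted.

```r
# Toy data; names and values are hypothetical.
df <- data.frame(
  country   = c("A", "A", "A", "B", "B"),
  w         = c(2, 1, 1, 5, 5),   # survey sampling weight
  in_sample = c(1, 1, 0, 1, 1)    # analytic-sample flag for one outcome
)

# Hypothetical country populations aged 30-69.
pop <- c(A = 300, B = 100)

# Records outside the analytic sample keep their row but get weight zero.
df$w_use <- ifelse(df$in_sample == 1, df$w, 0)

# Rescale within country so the weights sum to the country population.
w_sum    <- ave(df$w_use, df$country, FUN = sum)
df$w_pop <- unname(df$w_use / w_sum * pop[as.character(df$country)])

# Check: rescaled weights sum to the population totals.
tapply(df$w_pop, df$country, sum)
```

Keeping the zero-weight rows in the data preserves the PSU and stratum structure that design-based variance estimators rely on.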

## Requirements

- Stata 15 or higher
- Access to the HPACC datasets and WPP 2021 estimates

## Usage

To run the cleaning process, open Stata and execute:

```stata
do "HPACC cleaning.do"
```


# 2. R scripts
## Overview
Using the `HPACC_Maindata_appended.dta` file generated by the Stata do-file, the Step 1 R script fits the models and generates posterior predictions, and the Step 2 R script performs survey-weighted post-estimation.

## Step 1 R script performs the following tasks for each target outcome:

--Data Subset and Preparation
     --Restrict to observations where sampling = 1.
     --Restrict to complete cases with non-missing values for all covariates used in the model.
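
A minimal sketch of this subsetting step in base R. The toy data and the covariate names (`age`, `sex`, `bmi_cat`, `educ`) are assumptions for illustration; the actual covariate list is defined in the Step 1 script.

```r
# Toy data; column names other than `sampling` are hypothetical.
df <- data.frame(
  sampling = c(1, 1, 0, 1),
  age      = c(45, NA, 50, 60),
  sex      = c(1, 2, 1, 2),
  bmi_cat  = c(2, 1, 3, NA),
  educ     = c(1, 2, 2, 3)
)

# Covariates assumed to enter the model.
covars <- c("age", "sex", "bmi_cat", "educ")

# Keep rows in the analytic sample with no missing covariate values.
model_df <- df[df$sampling %in% 1 & complete.cases(df[, covars]), ]
nrow(model_df)
```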
     
--Model Fitting
     --Specify and fit six Bayesian logistic regression models using the brms package.
     
--Model Selection
     --Use Leave-One-Out cross-validation (LOO) to compare model fits.
     --Select a single best-performing model specification to apply across all outcomes.
     
--Prepare Data for Post-Estimation Analysis
  For sampling = 1:
     --Store 2,000 posterior draws from the selected model.
     --Include relevant grouping variables, stratum, and diabetes status.
  For sampling = 0:
     --Include stratum and PSU.
     --Set crude and WHO-standardized population weights to zero; these records are excluded from post-estimation but retained for correct variance estimation.
     
## Step 2 R script processes posterior draws to produce survey-weighted estimates:

--Posterior Draw Selection
     --Subsample 100 posterior draws by selecting every 20th draw from the full set of 2,000.
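
A minimal sketch of the thinning step, assuming the draws are stored as a matrix with one row per posterior draw. Whether the Step 2 script starts at draw 1 or draw 20 is not stated here; this example starts at draw 20. The matrix contents are placeholders.

```r
n_total <- 2000   # posterior draws stored in Step 1
step    <- 20     # keep every 20th draw

# Indices 20, 40, ..., 2000.
keep <- seq(from = step, to = n_total, by = step)

# Placeholder matrix: rows are draws, columns are observations.
draws <- matrix(seq_len(n_total * 3), nrow = n_total, ncol = 3)

# Thinned set of draws used for post-estimation.
thinned <- draws[keep, , drop = FALSE]
dim(thinned)
```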

--Survey-Weighted Estimation
     --For each grouping variable (overall, country, region, income group, age, sex, BMI category, education):
          --Restrict to sampling = 1 and complete cases.
          --Use the survey package to compute the mean and variance for each of the 100 draws.
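
The per-draw estimation can be sketched with `svydesign()` and `svymean()` from the survey package. This is an illustration under stated assumptions, not the Step 2 script: the simulated data, the column names (`psu`, `stratum`, `w`, `draw1`, ...), and the use of only three draws are all hypothetical, and the example does not show how the per-draw results are combined afterwards.

```r
library(survey)

set.seed(1)
n <- 40

# Simulated design variables (hypothetical names and values).
dat <- data.frame(
  psu     = rep(1:8, each = 5),
  stratum = rep(1:2, each = 20),
  w       = runif(n, 0.5, 2)
)

# Placeholder posterior predicted probabilities: one column per retained draw.
n_draws <- 3
draws <- matrix(runif(n * n_draws), nrow = n,
                dimnames = list(NULL, paste0("draw", seq_len(n_draws))))
dat <- cbind(dat, draws)

# Complex survey design: PSUs nested within strata, with sampling weights.
des <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w,
                 data = dat, nest = TRUE)

# Survey-weighted mean and its design-based variance for each draw.
res <- t(sapply(colnames(draws), function(d) {
  est <- svymean(as.formula(paste0("~", d)), des)
  c(mean = as.numeric(coef(est)), var = as.numeric(vcov(est)))
}))
res
```

Subgroup estimates (for example by country or sex) would typically use `svyby()` on the same design object rather than subsetting the data first.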

--Summary of Results
     --Compute the overall mean estimate.
     --Report means and 95% confidence intervals by country, region, and income group.
     --For sex, age, BMI, and education groups, report:
          --Group mean and confidence interval
          --Absolute difference from reference group (with confidence interval)
          --Ratio relative to reference group (with confidence interval)
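
The two contrast measures can be illustrated per draw in base R. The numbers below are made up, the column names (`ref`, `grp`) are hypothetical, and the construction of the confidence intervals is not shown because it depends on the Step 2 script.

```r
# Made-up group estimates for three draws.
est <- data.frame(
  draw = 1:3,
  ref  = c(0.40, 0.42, 0.38),   # reference group (e.g., one sex)
  grp  = c(0.50, 0.49, 0.47)    # comparison group
)

# Absolute difference and ratio relative to the reference group.
est$abs_diff <- est$grp - est$ref
est$ratio    <- est$grp / est$ref
est
```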

## Comparability
This analysis was developed using the following software versions:
- R version 4.4.0 (2024-04-24)
- survey package (for survey-weighted estimation): 4.4.2
- brms package (for Bayesian modeling): 2.21.0